Data Extraction

Extract structured data from pages using CSS or XPath selectors.

Basic extraction

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "title": "css:h1",
        "first_quote": "css:.text",
    },
)
print(result.extracted_data["title"])
print(result.extracted_data["first_quote"])

Multiple values

Set "multiple": True to extract all matching elements as a list:

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "quotes": {"selector": "css:.text", "multiple": True},
        "authors": {"selector": "css:.author", "multiple": True},
    },
)
for quote, author in zip(result.extracted_data["quotes"], result.extracted_data["authors"]):
    print(f"{quote} — {author}")
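Because zip stops at the shorter of its inputs, a selector that misses one element would silently drop or misalign pairs. A minimal sketch of a stricter pairing step follows. It uses plain lists as stand-ins for result.extracted_data["quotes"] and result.extracted_data["authors"], and pair_records is a hypothetical helper, not part of the client:

```python
def pair_records(quotes, authors):
    """Pair quotes with authors, refusing lists of different lengths."""
    if len(quotes) != len(authors):
        raise ValueError(
            f"got {len(quotes)} quotes but {len(authors)} authors; "
            "the two selectors matched different numbers of elements"
        )
    return [{"quote": q, "author": a} for q, a in zip(quotes, authors)]


# Stand-in data; in practice these lists would come from result.extracted_data.
quotes = ["Quote one", "Quote two"]
authors = ["Author one", "Author two"]

records = pair_records(quotes, authors)
for record in records:
    print(f"{record['quote']} ({record['author']})")
```

Failing loudly on a length mismatch is one possible design choice; a scraper that tolerates partial data might log the mismatch and keep the pairs it has instead.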

Extract attributes

Use "attribute" to extract an element attribute such as href, src, or data-*:

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "links": {"selector": "css:a", "attribute": "href", "multiple": True},
        "images": {"selector": "css:img", "attribute": "src", "multiple": True},
    },
)
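The href and src values in a page are often relative paths. Assuming the extracted attribute values come back as they appear in the HTML (an assumption; this page does not say whether the client resolves them), the sketch below converts them to absolute URLs with the standard library's urljoin. The links list is stand-in data and absolutize is a hypothetical helper:

```python
from urllib.parse import urljoin


def absolutize(base_url, values):
    """Resolve each extracted href/src against the page URL, skipping empty values."""
    return [urljoin(base_url, value) for value in values if value]


base_url = "https://quotes.toscrape.com/"

# Stand-in for result.extracted_data["links"]; None models an <a> without an href.
links = ["/login", "/author/Albert-Einstein", "https://www.goodreads.com/quotes", None]

absolute_links = absolutize(base_url, links)
print(absolute_links)
```

Values that are already absolute pass through urljoin unchanged, so the helper can be applied to a mixed list.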

XPath selectors

Prefix a selector with xpath: instead of css: to use an XPath expression:

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "quotes": "xpath://span[@class='text']",
        "authors": "xpath://small[@class='author']",
    },
)
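One difference to keep in mind when translating between the two selector types: the XPath predicate [@class='text'] compares the whole class attribute, while the CSS selector .text matches any element that has text among its classes. The sketch below illustrates this on a small hand-written snippet with the standard library's ElementTree. It does not use the client, and ElementTree needs the relative form .//span rather than //span:

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed stand-in for a page fragment.
SNIPPET = """
<div>
  <span class="text">Quote one</span>
  <span class="text highlighted">Quote two</span>
  <small class="author">Author one</small>
</div>
""".strip()

root = ET.fromstring(SNIPPET)

# XPath-style predicate: the class attribute must equal "text" exactly.
exact = [el.text for el in root.findall(".//span[@class='text']")]

# CSS-style class match: "text" must be one of the space-separated classes.
by_token = [
    el.text
    for el in root.iter("span")
    if "text" in el.get("class", "").split()
]

print(exact)     # ['Quote one']
print(by_token)  # ['Quote one', 'Quote two']
```

If elements on the target page carry more than one class, an XPath such as //span[contains(@class, 'text')] is a common, looser alternative, though it also matches class names that merely contain the substring.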

Extraction + browser

Extraction also works with browser rendering, so selectors can match content generated by JavaScript:

result = client.scrape(
    "https://spa-app.com/products",
    browser=True,
    extract={
        "names": {"selector": "css:.product-name", "multiple": True},
        "prices": {"selector": "css:.price", "multiple": True},
    },
)

When to use extract vs EvaluateAction

| Use case | Tool |
| --- | --- |
| Data is visible in the HTML/DOM | `extract` (CSS/XPath selectors) |
| Data comes from JS variables | EvaluateAction |
| Data comes from internal APIs | EvaluateAction |
| Complex DOM logic needed | EvaluateAction |
